This vignette assumes a SQL server at localhost (we
use PostgreSQL), with patient data in OMOP Common Data Model v5.4 format
in schema cdm_new_york3. The data shown in this example is
synthetic, generated by Synthea(TM) Patient
Generator.
library(phea)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Connect to SQL server.
dbcon <- DBI::dbConnect(RPostgres::Postgres(),
host = 'localhost', port = 7654, dbname = 'fort',
user = cred$pg$user, password = cred$pg$pass)
# Call setup_phea so we can use sqlt() and sql0().
setup_phea(dbcon, 'cdm_new_york3')
How to find patients who had an acute myocardial infarction (AMI) when they were clinically obese (body mass index, BMI > 30)?
A patient can have multiple myocardial infarctions, and their weight can change over time. In this example we are only interested in cases where the AMI happened while the patient’s BMI was known to be > 30.
The problem is, we cannot go back in time and measure the BMI of the patient on the day the AMI happened. Maybe someone did measure the patient’s weight on that day, but maybe not. So, what do we mean when we say “known to be obese”?
Maybe the only weight measurement available for that person is from 3 years before, or after, the day of the AMI. Or maybe the AMI happened between two weight measurements, one giving BMI > 30 hence obesity, the other giving BMI < 30 hence no obesity. Which one do you pick?
Here, we will pick whichever body measurements are closest in time to each AMI event.
Say a patient:
…that means that on day 40, the closest weight record available was the one from day 1. That record said that the patient was obese. So, we count that as an AMI that happened in a patient known to be obese.
That would not be the case had the AMI been on day 90, for example. Similarly, if that same patient had a second AMI (god forbid) on day 120, or any day after 60, that second AMI would count in the non-obese group.
Let us first compute body mass index like in another vignette.
# Weight component
# Loinc 29463-7 Body weight, OMOP concept ID 3025315
weight_component <- sqlt(measurement) |>
filter(measurement_concept_id == 3025315) |>
make_component(
.ts = measurement_datetime,
.pid = person_id)
# Height component
# Loinc 8302-2 Body height, OMOP concept ID 3036277
height_component <- sqlt(measurement) |>
filter(measurement_concept_id == 3036277) |>
make_component(
.ts = measurement_datetime,
.pid = person_id)
To which we add a component for acute myocardial infarction (AMI),
from table CONDITION_OCCURRENCE.
# Acute myocardial infarction (AMI) component
# SNOMED 22298006 Myocardial infarction, OMOP concept ID 4329847
ami_component <- sqlt(condition_occurrence) |>
filter(condition_concept_id == 4329847) |>
make_component(
.ts = condition_start_datetime,
.pid = person_id)
Spelling out the BMI formula is straightforward:
bmi = height/(weight * weight)
From which we can write a formula for “is obese”:
is_obese = bmi > 30
Notice that while the first formula is numerical (gives a number),
the second one is logical (gives a TRUE or
FALSE). Both are supported transparently, as long as
they’re not mixed, e.g. 3 / TRUE.
However, doing the same for “has an AMI” may not be as clear. What is the formula for “patient has an AMI event”?
Since we have a column that gives the date when the AMI happened
(it’s called condition_start_datetime), we can check
whether that column is empty. In each patient, for all phenotype
calculations with a timestamp past the timestamp of an AMI event, that
column will be populated by Phea with the date of that AMI event. For
timestamps prior to the patient’s first AMI, the column is empty. If the
patient has a new AMI, the date is updated.
To check if a column is empty in SQL, we use
is not null:
has_ami = ami_condition_start_datetime is not null
Armed with this, we can write the final formula for “has AMI and is obese”:
obese_ami = is_obese AND has_ami
# AMI in obese patient
ami_obese <- calculate_formula(
components = list(
weight = weight_component,
height = height_component,
ami = ami_component),
fml = list(
height_in_meters = 'height_value_as_number / 100',
bmi = 'weight_value_as_number / (height_in_meters * height_in_meters)',
has_bmi = 'bmi is not null',
is_obese = 'bmi > 30',
has_ami = 'ami_condition_start_datetime is not null',
ami_obese = 'has_ami and is_obese'
))
# Keep only patients who had AMI
ami_patients <- ami_obese |>
filter(has_ami) |>
select(pid) |>
distinct() |>
pull()
ami_obese <- ami_obese |>
filter(pid %in% local(ami_patients))
head_shot(ami_obese) |>
kable()
| row_id | pid | ts | window | height_value_as_number | weight_value_as_number | ami_condition_start_datetime | height_in_meters | bmi | has_bmi | is_obese | has_ami | ami_obese |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8060 | 196 | 1988-09-10 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8081 | 196 | 1989-09-16 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8082 | 196 | 1990-09-22 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8083 | 196 | 1991-09-28 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8084 | 196 | 1992-10-03 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8065 | 196 | 1993-10-09 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8066 | 196 | 1994-10-15 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8067 | 196 | 1995-10-21 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8068 | 196 | 1996-10-26 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
| 8069 | 196 | 1997-11-01 | 00:00:00 | 186 | 94.4 | NA | 1.86 | 27.28639 | TRUE | FALSE | FALSE | FALSE |
Phea will compute ami_obese for all points in time in
each patient. If a patient has a single AMI event, but had his body
weight and height measured on 6 different dates, that is 7 computations
of ami_obese, unless the date of the AMI coincides with one
of the dates of the body measurements.
Since we want to consider only the body measurements made closest in
time to the AMI event, we can filter the rows by the
window.
Phea returns the window alongside each phenotype
calculation. The window is the time distance between the
oldest record, and the most recent one, that went into the phenotype
calculation.
Therefore, in each patient, for each AMI event, whichever row has the
smallest window, that row is using the body measurements
that are closest possible, in time, to the AMI event.
# Keep only the smallest window for each patient:
ami_obese_f <- ami_obese |>
filter(has_ami & has_bmi) |>
keep_row_by('window', c('pid', 'ami_condition_start_datetime'))
head_shot(ami_obese_f) |>
kable()
| row_id | pid | ts | window | height_value_as_number | weight_value_as_number | ami_condition_start_datetime | height_in_meters | bmi | has_bmi | is_obese | has_ami | ami_obese |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8079 | 196 | 2007-12-22 | 364 days | 186.0 | 94.4 | 2007-12-22 | 1.860 | 27.28639 | TRUE | FALSE | TRUE | FALSE |
| 8361 | 204 | 1975-04-20 | 14 days | 176.1 | 93.2 | 1975-04-20 | 1.761 | 30.05365 | TRUE | TRUE | TRUE | TRUE |
| 10839 | 272 | 1990-12-30 | 105 days | 182.8 | 91.2 | 1990-12-30 | 1.828 | 27.29245 | TRUE | FALSE | TRUE | FALSE |
| 11376 | 286 | 1987-11-04 | 189 days | 168.9 | 53.3 | 1987-11-04 | 1.689 | 18.68392 | TRUE | FALSE | TRUE | FALSE |
| 11757 | 295 | 2018-07-08 | 154 days | 181.0 | 91.8 | 2018-07-08 | 1.810 | 28.02112 | TRUE | FALSE | TRUE | FALSE |
| 12765 | 321 | 2009-01-28 | 112 days | 191.4 | 110.4 | 2009-01-28 | 1.914 | 30.13597 | TRUE | TRUE | TRUE | TRUE |
| 13067 | 328 | 1989-09-28 | 98 days | 162.4 | 55.3 | 1989-09-28 | 1.624 | 20.96781 | TRUE | FALSE | TRUE | FALSE |
| 15489 | 379 | 2005-09-09 | 42 days | 182.8 | 92.1 | 2005-09-09 | 1.828 | 27.56178 | TRUE | FALSE | TRUE | FALSE |
| 15915 | 389 | 2012-03-03 | 147 days | 169.1 | 79.8 | 2012-03-03 | 1.691 | 27.90716 | TRUE | FALSE | TRUE | FALSE |
| 18513 | 452 | 1932-09-15 | 160 days | 187.3 | 75.5 | 1932-09-15 | 1.873 | 21.52144 | TRUE | FALSE | TRUE | FALSE |
ami_obese_patients <- ami_obese |>
filter(ami_obese) |>
select(pid) |>
distinct() |>
pull()
random_patient <- sample(ami_obese_patients, 1)
message('Sampled patient: ', random_patient)
#> Sampled patient: 21720
ami_obese |>
select(
-height_value_as_number,
-ami_condition_start_datetime) |>
phea_plot(pid = random_patient)
#> Collecting lazy table, done.
To see the SQL query underlying the phenotype, use helper function
code_shot(), or dbplyr::sql_render(), or the
.clip_sql option in calculate_formula().
code_shot(ami_obese)
SELECT *
FROM (
SELECT *, has_ami and is_obese AS "ami_obese"
FROM (
SELECT *, ami_condition_start_datetime is not null AS "has_ami"
FROM (
SELECT *, bmi > 30 AS "is_obese"
FROM (
SELECT *, bmi is not null AS "has_bmi"
FROM (
SELECT
*,
weight_value_as_number / (height_in_meters * height_in_meters) AS "bmi"
FROM (
SELECT
"row_id",
"pid",
"ts",
"window",
"height_value_as_number",
"weight_value_as_number",
"ami_condition_start_datetime",
height_value_as_number / 100 AS "height_in_meters"
FROM (
SELECT
*,
"ts" - least(weight_ts, height_ts, ami_ts) AS "window",
last_value(row_id) over (partition by "pid", "ts") AS "ts_row"
FROM (
SELECT
"row_id",
"pid",
"ts",
MAX("weight_value_as_number") OVER (PARTITION BY "pid", "..dbplyr_partion_1") AS "weight_value_as_number",
MAX("weight_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_2") AS "weight_ts",
MAX("height_value_as_number") OVER (PARTITION BY "pid", "..dbplyr_partion_3") AS "height_value_as_number",
MAX("height_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_4") AS "height_ts",
MAX("ami_condition_start_datetime") OVER (PARTITION BY "pid", "..dbplyr_partion_5") AS "ami_condition_start_datetime",
MAX("ami_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_6") AS "ami_ts"
FROM (
SELECT
*,
SUM(CASE WHEN (("weight_value_as_number" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_1",
SUM(CASE WHEN (("weight_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_2",
SUM(CASE WHEN (("height_value_as_number" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_3",
SUM(CASE WHEN (("height_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_4",
SUM(CASE WHEN (("ami_condition_start_datetime" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_5",
SUM(CASE WHEN (("ami_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_6"
FROM (
SELECT
row_number() over () AS "row_id",
"pid",
"ts",
last_value(case when "name" = 'ouzkytg7046r' then "value_as_number" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "weight_value_as_number",
last_value(case when "name" = 'ouzkytg7046r' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "weight_ts",
last_value(case when "name" = 'hvj925mu1sze' then "value_as_number" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "height_value_as_number",
last_value(case when "name" = 'hvj925mu1sze' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "height_ts",
last_value(case when "name" = 'n8t9unwm71hv' then "condition_start_datetime" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "ami_condition_start_datetime",
last_value(case when "name" = 'n8t9unwm71hv' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "ami_ts"
FROM (
(
SELECT *, NULL AS "condition_start_datetime"
FROM (
(
SELECT
'ouzkytg7046r' AS "name",
"person_id" AS "pid",
"measurement_datetime" AS "ts",
"value_as_number"
FROM "cdm_new_york3"."measurement"
WHERE ("measurement_concept_id" = 3025315.0)
)
UNION ALL
(
SELECT
'hvj925mu1sze' AS "name",
"person_id" AS "pid",
"measurement_datetime" AS "ts",
"value_as_number"
FROM "cdm_new_york3"."measurement"
WHERE ("measurement_concept_id" = 3036277.0)
)
) "q01"
)
UNION ALL
(
SELECT
"name",
"pid",
"ts",
NULL AS "value_as_number",
"condition_start_datetime"
FROM (
SELECT
'n8t9unwm71hv' AS "name",
"person_id" AS "pid",
"condition_start_datetime" AS "ts",
"condition_start_datetime"
FROM "cdm_new_york3"."condition_occurrence"
WHERE ("condition_concept_id" = 4329847.0)
) "q02"
)
) "q03"
) "q04"
) "q05"
) "q06"
) "q07"
WHERE ("row_id" = "ts_row")
) "q08"
) "q09"
) "q10"
) "q11"
) "q12"
) "q01"
WHERE ("pid" IN (196, 204, 272, 286, 295, 321, 328, 379, 389, 452, 531, 555, 593, 602, 691, 771, 802, 928, 1055, 1059, 1062, 1232, 1327, 1328, 1463, 1564, 1597, 1678, 1696, 1749, 1926, 1964, 1972, 2082, 2186, 2205, 2213, 2249, 2374, 2488, 2512, 2553, 2599, 2698, 2707, 2714, 3090, 3199, 3210, 3217, 3257, 3316, 3327, 3351, 3359, 3370, 3377, 3406, 3419, 3426, 3450, 3460, 3511, 3558, 3594, 3734, 3829, 3859, 3860, 3937, 3986, 4067, 4087, 4216, 4221, 4246, 4299, 4308, 4396, 4449, 4462, 4488, 4529, 4589, 4742, 4771, 4833, 4850, 4912, 4918, 4995, 5104, 5145, 5147, 5185, 5190, 5221, 5332, 5333, 5342, 5430, 5512, 5565, 5601, 5738, 5744, 5854, 5906, 5916, 5963, 5999, 6021, 6040, 6042, 6099, 6180, 6200, 6223, 6235, 6245, 6268, 6335, 6497, 6520, 6543, 6562, 6617, 6648, 6735, 6748, 6805, 6845, 6957, 7001, 7026, 7377, 7385, 7429, 7469, 7486, 7501, 7522, 7524, 7527, 7559, 7560, 7651, 7677, 7700, 7720, 7738, 7772, 7847, 7888, 7920, 8054, 8086, 8159, 8184, 8218, 8220, 8268, 8278, 8379, 8437, 8475, 8564, 8646, 8668, 8729, 8784, 8786, 8819, 8865, 8889, 9032, 9165, 9251, 9302, 9372, 9410, 9469, 9498, 9517, 9521, 9529, 9571, 9589, 9599, 9644, 9701, 9774, 9797, 9994, 10113, 10249, 10267, 10293, 10316, 10420, 10525, 10630, 10638, 10645, 10679, 10680, 10712, 10733, 10747, 10755, 10809, 10878, 10901, 10938, 11003, 11011, 11026, 11061, 11074, 11123, 11138, 11141, 11149, 11166, 11170, 11199, 11320, 11339, 11374, 11385, 11404, 11466, 11624, 11706, 11789, 11840, 11866, 11872, 11978, 12034, 12053, 12082, 12226, 12282, 12347, 12371, 12395, 12462, 12516, 12601, 12647, 12747, 12852, 12866, 12903, 12931, 13001, 13091, 13149, 13183, 13293, 13339, 13402, 13421, 13458, 13471, 13542, 13647, 13691, 13763, 13789, 13804, 13809, 13814, 13820, 13825, 13856, 13858, 13875, 13903, 13930, 13969, 13980, 13988, 13989, 14021, 14027, 14071, 14098, 14105, 14192, 14201, 14213, 14349, 14410, 14453, 14498, 14500, 14523, 14644, 14704, 14720, 14904, 14909, 15074, 15200, 15251, 15311, 15388, 15398, 15605, 15636, 15659, 15673, 15921, 16087, 16113, 16169, 16208, 16255, 16271, 16397, 16433, 16452, 16458, 16522, 16580, 16641, 16646, 16698, 16703, 16716, 16770, 16774, 16803, 16857, 16859, 16871, 16893, 17020, 17023, 17047, 17093, 17254, 17373, 17561, 17586, 17625, 17629, 17747, 17775, 17809, 17834, 17838, 17946, 18049, 18100, 18164, 18227, 18309, 18344, 18356, 18439, 18528, 18602, 18603, 18626, 18646, 18741, 18850, 18856, 18878, 18879, 18902, 19150, 19153, 19156, 19244, 19247, 19301, 19339, 19464, 19522, 19527, 19528, 19668, 19700, 19724, 19802, 19815, 19889, 19907, 19919, 19949, 20002, 20032, 20060, 20075, 20136, 20167, 20173, 20179, 20209, 20223, 20267, 20310, 20340, 20359, 20411, 20424, 20432, 20523, 20527, 20536, 20592, 20667, 20691, 20719, 20747, 20827, 20935, 21075, 21183, 21243, 21270, 21273, 21278, 21289, 21304, 21376, 21422, 21550, 21563, 21673, 21677, 21708, 21720, 21806, 21809, 21861, 21936, 21939, 21966, 22012, 22121, 22134, 22148, 22156, 22157, 22180, 22242, 22460, 22499, 22506, 22577, 22657, 22677, 22680, 22872, 22884, 22945, 22961, 22964, 23053, 23062, 23064))